Abstract

Suppose a sample of size $n$ is observed from the $d$-dimensional
density $f$. Conditions are given which ensure that a single-linkage
clustering algorithm can asymptotically find the decomposition of the
support of $f$ into connected closed sets.


Clustering is the process of grouping similar objects. For our
purposes the objects to be grouped can be thought of as a set of
$d$-dimensional vectors, and a clustering algorithm can be thought of
as any scheme for partitioning this set into subsets called clusters.
Our paper analyzes the asymptotic performance of clustering algorithms
for a simple probabilistic model, with the result that versions of a
single-linkage clustering algorithm are shown to be asymptotically
effective. Excellent summaries of previous work in clustering are
contained in Hartigan and Dorofeyuk, while a more technical and
thorough description of our results may be found in Devroye and
Wagner.


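Let $X$ be a random vector with values in $\mathbb{R}^d$ and a
probability density

$$f = \sum_{i=1}^{M} p_i f_i \tag{1}$$

where $p_i > 0$, $1 \le i \le M$, $\sum_{i=1}^{M} p_i = 1$, and
$f_1,\dots,f_M$ are probability densities. If $f_i$ has support $C_i$,
$1 \le i \le M$, then we assume that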


    (a) $C_i$ is connected, $1 \le i \le M$,
    (b) $C_1,\dots,C_M$ are disjoint, and                    (2)
    (c) $C_i$ is bounded, $1 \le i \le M$.




The supports $C_1,\dots,C_M$ may be thought of as the clusters chosen
by nature. In particular, if independent observations are made on (1)
then $C_1,\dots,C_M$ determine a natural partition of these
observations. However, suppose that the statistician assumes only that
(1) and (2) hold for some $M$, $p_1,\dots,p_M$ and $f_1,\dots,f_M$
and, in place of specific knowledge of $f$, has a sample of size $n$
from (1), say $X_1,\dots,X_n$. The question that concerns us here is
how the statistician can asymptotically obtain the same grouping of
observations on (1) as he would if he knew $C_1,\dots,C_M$.
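To make the sampling model concrete, here is a minimal sketch that draws a sample from a mixture of the form (1). It assumes, purely for illustration, that each $f_i$ is uniform on an axis-aligned box, so that the supports are connected, disjoint and bounded as in (2); the function name `sample_mixture`, the boxes and the weights are hypothetical and do not come from the paper.

```python
import random

def sample_mixture(n, weights, boxes, seed=0):
    """Draw n points from f = sum_i p_i f_i, with f_i uniform on boxes[i].

    Each box is a list of (low, high) intervals, one per coordinate.
    Returns (points, labels); labels[k] is the component chosen by nature.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        # Pick component i with probability p_i, then sample from f_i.
        i = rng.choices(range(len(weights)), weights=weights)[0]
        points.append(tuple(rng.uniform(lo, hi) for lo, hi in boxes[i]))
        labels.append(i)
    return points, labels

# Two disjoint, bounded, connected supports in the plane (assumed example).
boxes = [[(0.0, 1.0), (0.0, 1.0)], [(2.0, 3.0), (0.0, 1.0)]]
points, labels = sample_mixture(200, [0.4, 0.6], boxes)
```

The statistician of the text sees only `points`; `labels` records nature's grouping and is kept here solely so that a clustering can later be scored against it.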

From the sample $X_1,\dots,X_n$, the statistician will, using his
clustering algorithm, construct a partition $A_1,\dots,A_L$ of
$\mathbb{R}^d$. Future observations, that is, observations from (1)
which are independent of those in his sample, will then be grouped
together if they fall in the same set $A_j$. For this reason, we shall
also refer to the sets $A_1,\dots,A_L$ as clusters. In the vast
clustering literature, concentration is focused on grouping the sample
$X_1,\dots,X_n$, and the sets of the partition of $X_1,\dots,X_n$
determined by $A_1,\dots,A_L$ are usually referred to as clusters.
Concentrating on partitioning $X_1,\dots,X_n$ seems warranted, for
example, in clustering problems arising in paleontology studies where
new observations are not expected. However, in medical situations,
such as trying to cluster the types of shock for emergency care
purposes, the statistician is interested in the performance of the
algorithm on future observations. Our model is directed toward this
type of situation.

Referring to Figure 1, there are three natural clusters, but the
algorithm with the sample $X_1,\dots,X_n$ has yielded four clusters in
(a), three in (b) and two in (c). How does one measure the performance
of the algorithm on future observations? Agreeing that what we call
each cluster $C_i$ is unimportant as long as we give one unique label
to each, we see that the probability of misclassification becomes

$$L_n = \min_{g} \sum_{i=1}^{M} p_i \int_{A^c_{g(i)}} f_i(x)\,dx \tag{3}$$

where $A^c$ denotes the complement of $A$, the minimum is taken over
all one-to-one functions $g: \{1,\dots,M\} \to \{1,\dots,\max(M,L)\}$
and, if $M > L$, we put $A_{L+1} = \dots = A_M = \emptyset$.


In particular, if each $C_i$ is contained in some $A_j$ and each $A_j$
contains at most one $C_i$, then $L_n = 0$. It should be stressed that
$L_n$ is a random variable which depends on $X_1,\dots,X_n$ and whose
value is just the frequency of observations misclassified when a large
number of new observations are classified with the partition
$A_1,\dots,A_L$.
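The frequency interpretation of the misclassification probability can be sketched in code. The snippet below is an illustrative empirical analogue, not the paper's procedure: given nature's component index and the assigned cell index for a batch of fresh observations, it minimises the misclassified fraction over all one-to-one maps $g$ by brute force. The function name and the 0-based indexing are assumptions made for the example, and the enumeration is only practical for small $M$ and $L$.

```python
from itertools import permutations

def misclassification(true_labels, assigned, M, L):
    """Empirical analogue of L_n: minimise, over one-to-one maps g, the
    fraction of points from component i that do not fall in cell g(i)."""
    K = max(M, L)  # cells L..K-1 play the role of the empty sets when M > L
    n = len(true_labels)
    # counts[i][j] = number of points from component i that fell in cell j
    counts = [[0] * K for _ in range(M)]
    for i, j in zip(true_labels, assigned):
        counts[i][j] += 1
    best = 1.0
    # permutations(range(K), M) enumerates every injective map {0..M-1} -> {0..K-1}
    for g in permutations(range(K), M):
        correct = sum(counts[i][g[i]] for i in range(M))
        best = min(best, 1.0 - correct / n)
    return best
```

For instance, if two components are merged into a single cell (`M = 2`, `L = 1`), whichever component is not matched to that cell is counted as misclassified in full.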

Our interest here is in finding what properties a clustering algorithm
must have to ensure that $L_n \to 0$ with probability one. The
following clustering algorithm, a version of the familiar
single-linkage algorithms, has this property under some slight
additional assumptions on $f$. More extensive results for other
algorithms and assumptions may be found in Devroye and Wagner.

If $r > 0$, join the two points $X_i, X_j$ by a link if
$d(X_i,X_j) \le r$, $1 \le i,j \le n$. Call two points $X_k, X_l$
connected if there exists a sequence $X_{i_0},\dots,X_{i_m}$ from
$X_1,\dots,X_n$ with $i_0 = k$, $i_m = l$, where $X_{i_{j-1}}$ and
$X_{i_j}$ are joined by a link, $1 \le j \le m$. The set
$\{X_1,\dots,X_n\}$ is then partitioned into connected subsets
$K_1,\dots,K_L$. A partition $A_1,\dots,A_L$ of $\mathbb{R}^d$ is
obtained from $K_1,\dots,K_L$ by putting the point
$x \in \mathbb{R}^d$ into $A_j$ if the closest point to $x$ from
$X_1,\dots,X_n$ is in $K_j$ (ties are broken arbitrarily).
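The two steps just described (link points within distance $r$, then assign any point of the space to the cell of its nearest sample point) can be sketched as follows. This is a minimal illustration assuming the Euclidean distance for $d$ and a simple union-find structure over all pairs; the function names are hypothetical, and ties are broken by taking the first nearest sample point.

```python
import math

def single_linkage(points, r):
    """Label each sample point by its connected subset K_1, ..., K_L,
    where two points are linked when their distance is at most r."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)  # merge the two components

    roots = {}  # component root -> label 0, 1, 2, ... in order of appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def assign(x, points, labels):
    """Index of the cell A_j containing x: the label of the sample point
    nearest to x (the first such point wins on ties)."""
    nearest = min(range(len(points)), key=lambda k: math.dist(x, points[k]))
    return labels[nearest]
```

The all-pairs loop costs on the order of $n^2$ distance evaluations, which is fine for a sketch; a spatial index would be the usual refinement for large samples.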


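Theorem. If $r = r_n$ satisfies

$$\text{(i)}\ \ n r_n^d / \log n \to \infty, \qquad \text{(ii)}\ \ r_n \to 0, \tag{4}$$

and if, for some $a, b > 0$,

$$\inf_{x \in \bigcup_{i=1}^{M} C_i} \int_{S(x,\rho)} f(z)\,dz \ge a\rho^d, \quad 0 < \rho \le b, \tag{5}$$

where $S(x,\rho)$ is the sphere centered at $x$ with radius $\rho$,
then

$$L_n \to 0 \quad \text{w.p.1.}$$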
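As an illustration that is not part of the original argument, the following worked check exhibits one radius sequence that meets both requirements placed on $r_n$ in the Theorem, together with one that shrinks too quickly. Both sequences are assumptions chosen only because the limits are easy to read off.

```latex
% Illustrative choice (an assumption, not from the source):
r_n = \left(\frac{(\log n)^2}{n}\right)^{1/d}
\quad\Longrightarrow\quad
\frac{n r_n^d}{\log n} = \log n \to \infty,
\qquad r_n \to 0 .

% A sequence that shrinks too fast violates the first requirement:
r_n = n^{-1/d}
\quad\Longrightarrow\quad
\frac{n r_n^d}{\log n} = \frac{1}{\log n} \to 0 .
```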

Proof. We recall that the support $C$ of a density $f$ is the smallest
closed set with the property that $\int_C f(x)\,dx = 1$. In
particular, the $C_i$ are closed sets. Because the $C_i$ are also
bounded and disjoint, they are compact and therefore

$$\inf_{x \in C_i,\ y \in C_j} \|x - y\| \ge \delta > 0$$

whenever $i \ne j$. We assume that $n$ is so large that $r_n < \delta$
(use (4)(ii)). Suppose that $X_1,\dots,X_n$ is such that every sphere
$S(x, r_n/3)$ contains at least one of the $X_i$ for
$x \in C = \bigcup_{i=1}^{M} C_i$.

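If we can show that

    (i) whenever $C_i \cap A_j \ne \emptyset$ and $X_k \in C_i$,
        then $X_k \in A_j$, and
    (ii) whenever $C_i \cap A_j \ne \emptyset$ and $X_k \notin C_i$,
        then $X_k \notin A_j$,

then we know that

$$M = L \quad \text{and} \quad \bigcup_{i=1}^{M} C_i = \bigcup_{i=1}^{M} \left(C_i \cap A_{g(i)}\right)$$

for some one-to-one mapping $g: \{1,\dots,M\} \to \{1,\dots,M\}$,
which in turn implies that

$$0 \le L_n \le \sum_{i=1}^{M} p_i \int_{A^c_{g(i)}} f_i(x)\,dx = 0.$$

Thus

$$P\{L_n > 0\} \le P\Big\{\inf_{x \in C} \mu_n(S(x, r_n/3)) = 0\Big\} \tag{6}$$

where $\mu_n$ is the empirical measure for $X_1,\dots,X_n$.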

Let us now prove (i) and (ii). Property (ii) is trivial since
$r_n < \delta$ and $X_j \in \bigcup_{i=1}^{M} C_i$ for all $j$ with
probability one. For property (i), we need only show that for any $x$
in $C_i$, and any $X_j \in C_i$, there exists a sequence
$Y_1,\dots,Y_t$ from $X_1,\dots,X_n$ with $Y_1 = X^{(1)}$,
$Y_t = X_j$, $\|Y_{k+1} - Y_k\| \le r_n$, $1 \le k < t$, where
$X^{(1)}$ is the nearest neighbor to $x$ among $X_1,\dots,X_n$. Since
$S(x, r_n/3)$ contains one $X_k$, and since $r_n < \delta$, we know
that $X^{(1)}$ belongs to $C_i$ as well, no matter what $x$ is picked
in $C_i$. By the connectedness of $C_i$, we can find
$\{x_1,\dots,x_t\} \subset C_i$ with $x_1 = X^{(1)}$, $x_t = X_j$, and
$\|x_{k+1} - x_k\| \le r_n/3$, $1 \le k < t$. Thus, since every
$S(x_k, r_n/3)$ contains one of the $X_l$'s, we know that there are
$Y_k \in S(x_k, r_n/3)$, $1 \le k \le t$, with $Y_1 = X^{(1)}$,
$Y_t = X_j$. Also,

$$\|Y_{k+1} - Y_k\| \le \|Y_{k+1} - x_{k+1}\| + \|x_{k+1} - x_k\| + \|x_k - Y_k\| \le r_n.$$

This concludes the proof of (i).


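As for (6), because the $C_i$ are bounded, we can find a grid
$\{y_1,\dots,y_N\} \subset C$ with the property that for every
$x \in C$ there exists a $y_i$ with $\|y_i - x\| \le r_n/12$. Such a
grid contains at most $(\gamma/r_n)^d$ points, where $\gamma > 0$ is a
constant depending upon $d$, $\|\cdot\|$, and the diameter of $C$. If
$r_n/12 \le b$, then

$$\inf_{1 \le i \le N} \int_{S(y_i, r_n/6)} f(z)\,dz \ge \inf_{x \in C} \int_{S(x, r_n/12)} f(z)\,dz \ge a\left(\frac{r_n}{12}\right)^d.$$

Also, if $\inf_{x \in C} \mu_n(S(x, r_n/3)) = 0$, then
$\mu_n(S(y_i, r_n/6)) = 0$ for some $i$ (take the grid point within
$r_n/12$ of an $x$ whose sphere is empty), so that

$$\begin{aligned}
P\{L_n > 0\} &\le P\Big\{\inf_{x \in C} \mu_n(S(x, r_n/3)) = 0\Big\} \\
&\le \sum_{i=1}^{N} P\{\mu_n(S(y_i, r_n/6)) = 0\} \\
&\le \left(\frac{\gamma}{r_n}\right)^d \left(1 - \inf_{x \in \bigcup_{i=1}^{M} C_i} \int_{S(x, r_n/12)} f(z)\,dz\right)^n \\
&\le \left(\frac{\gamma}{r_n}\right)^d e^{-a n r_n^d / 12^d}.
\end{aligned}$$

By (4)(i), we have that $\sum_n P\{L_n > 0\} < \infty$, so that, by
the Borel-Cantelli lemma, $L_n = 0$ for all $n$ large enough with
probability one, completing the proof of the Theorem.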
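To get a feel for the final bound $(\gamma/r_n)^d \exp(-a n r_n^d / 12^d)$, the sketch below evaluates its logarithm for a few sample sizes. Everything numeric here is an assumption made for illustration: the constants `a = 1` and `gamma = 1`, the dimension `d = 1`, and the radius sequence $r_n = ((\log n)^2/n)^{1/d}$ are not taken from the paper.

```python
import math

def log_bound(n, d, a, gamma, r):
    """Logarithm of (gamma / r)^d * exp(-a * n * r^d / 12^d)."""
    return d * math.log(gamma / r) - a * n * r**d / 12**d

def radius(n, d):
    # Hypothetical choice r_n = ((log n)^2 / n)^(1/d); it satisfies both
    # requirements on r_n stated in the Theorem.
    return (math.log(n) ** 2 / n) ** (1.0 / d)

# Assumed constants a = 1, gamma = 1 and dimension d = 1.
for n in (10**4, 10**6, 10**8):
    print(n, log_bound(n, d=1, a=1.0, gamma=1.0, r=radius(n, 1)))
```

Working in logarithms avoids underflow. The bound is asymptotic in nature: with these same constants in dimension two, the exponent $a(\log n)^2/144$ exceeds $\log n$ only once $\log n > 144$, so the inequality says nothing useful at practical sample sizes there.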